This report was created using behavseqanalyser v0.1.1-alpha; behaviours were grouped using the MITsoft categorisation (date: 2018-01-31).
The data is grouped by treatment. Data transformation: the data (percentage of time spent doing each behavior) were transformed using the square root method.
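As a minimal sketch of the transformation mentioned above: the data frame and its column names are hypothetical and are not taken from the package; only the square root step itself comes from the report.

```r
# Hypothetical example data: percentage of time spent on each behaviour
pct_time <- data.frame(Eat = c(4, 9, 16), Drink = c(1, 0, 25))

# Square root transformation, applied column by column
sqrt_time <- as.data.frame(lapply(pct_time, sqrt))
print(sqrt_time)
```

The square root compresses large values more than small ones, which is a common reason to apply it to percentage or count data before further analysis.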
## first second
## 11 11
We grouped the variables following the MITsoft argument to get 11 behavior categories. We used the following time windows and got 10 x 8 = 80 variables:
| time_reference | windowstart | windowend | windowname |
|---|---|---|---|
| Bin | 0 | 120 | first 2 hours of recording |
| Bintodark | -120 | 0 | last 2h before night |
| Bintodark | 0 | 180 | first 3h of night |
| Bintodark | 540 | 720 | late night (3h) |
| Bintodark | 720 | 864 | early day (3h) |
| Bintodark | -120 | 864 | full recording |
| lightcondition | DAY | NA | daytime |
| lightcondition | NIGHT | NA | nighttime |
Note that the last window may be truncated if not all datasets reach 900 min after lights on.
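The way a window turns one behaviour into one variable can be sketched as follows. This is an illustration under assumptions, not the package's code: the data are made up, `Bintodark` is assumed to be the time in minutes relative to the start of the night, and the windows are assumed to be half-open.

```r
# Hypothetical minute-by-minute data; Bintodark is assumed to be the time in
# minutes relative to the start of the night (negative values = before dark)
dat <- data.frame(
  Bintodark = seq(-120, 239),
  Eat = rep(c(10, 30, 20), each = 120)  # % of time spent eating (made up)
)

# Two of the windows from the table above
windows <- data.frame(
  windowstart = c(-120, 0),
  windowend   = c(0, 180),
  windowname  = c("last 2h before night", "first 3h of night"),
  stringsAsFactors = FALSE
)

# Mean value of one behaviour inside each window.
# Assumption: windows are half-open, [windowstart, windowend)
windows$Eat <- mapply(
  function(start, end) {
    mean(dat$Eat[dat$Bintodark >= start & dat$Bintodark < end])
  },
  windows$windowstart, windows$windowend
)
print(windows)
```

Repeating this for every behaviour category and every window gives one column per category and window combination, which is how the variable count grows multiplicatively.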
We then run a random forest to rank the variables by their importance for distinguishing the groups. We then take the best 20 and run the random forest again (so that the Gini scores obtained do not depend on the initial number of variables). We plot here the table of variables ordered by weight:
Let’s take a threshold of importance (Gini > 0.95) and get all variables satisfying the filter, or at least 8 variables:
Groom4, Drink6, Drink8, Eat1, Eat6, Drink7, Drink2 and Eat4
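The selection rule above (keep everything over the threshold, but never fewer than 8 variables) can be sketched as follows. The importance table, its column names and its scores are hypothetical; only the threshold of 0.95 and the minimum of 8 come from the report.

```r
# Hypothetical importance table, already sorted by decreasing importance
# (variable names and scores are made up for illustration)
importance_tab <- data.frame(
  variable = paste0("var", 1:12),
  gini = c(2.1, 1.8, 1.5, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3),
  stringsAsFactors = FALSE
)

threshold <- 0.95
min_vars <- 8

# Number of variables passing the threshold, but never fewer than min_vars
n_keep <- max(sum(importance_tab$gini > threshold), min_vars)
# Guard against asking for more variables than exist
n_keep <- min(n_keep, nrow(importance_tab))

selected <- importance_tab$variable[seq_len(n_keep)]
print(selected)
```

In this made-up table only 5 variables exceed the threshold, so the minimum takes over and the first 8 are kept.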
First, let's plot the 2 most discriminative variables according to the random forest:
library(ggplot2) # provides ggplot()
Plot <- Multi_datainput_m[, names(Multi_datainput_m) %in% as.character(R2[1:2, 1])] # R2 is assumed to hold the ranked variables
Plot <- cbind(Multi_datainput_m$groupingvar, Plot) # grouping variable becomes column 1
Title_plot <- paste0(names(Plot)[2], "x", names(Plot)[3]) # build the title before renaming the columns
names(Plot) <- c("groupingvar", "discriminant1", "discriminant2")
p <- ggplot(Plot, aes(y = discriminant1, x = discriminant2, color = groupingvar)) +
  geom_point() +
  labs(title = Title_plot) +
  #scale_x_log10() + scale_y_log10() +
  scale_colour_grey() + theme_bw() +
  theme(legend.position = "none")
print(p)
Here, we plot the first two or three components obtained after an ICA performed on the reduced data:
The PCA strategy shows that the behavior profiles of the two groups of animals are not identical.
We performed a PCA on the data and tested whether the groups show a difference in their first component score using a Mann-Whitney test, or a Kruskal-Wallis rank sum test if more than 2 groups exist. We plot here the first component in a boxplot:
NB: This strategy is conservative with respect to type I errors. On the other hand, it may well overlook existing differences.
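A minimal sketch of this PCA strategy, using base R only. The data are simulated (two groups of 11 animals and five made-up variables), and centring and scaling before the PCA is an assumption rather than something the report states.

```r
set.seed(1)

# Hypothetical data: 22 animals in two groups of 11, 5 made-up variables
groupingvar <- factor(rep(c("first", "second"), each = 11))
x <- matrix(rnorm(22 * 5), nrow = 22)
# Shift one variable in the second group so the groups differ
x[groupingvar == "second", 1] <- x[groupingvar == "second", 1] + 2

# PCA on centred and scaled data (assumption); keep the first component score
pca <- prcomp(x, center = TRUE, scale. = TRUE)
pc1 <- pca$x[, 1]

# Mann-Whitney (Wilcoxon rank sum) test for 2 groups, Kruskal-Wallis otherwise
if (nlevels(groupingvar) == 2) {
  res <- wilcox.test(pc1 ~ groupingvar)
} else {
  res <- kruskal.test(pc1 ~ groupingvar)
}
print(res$p.value)

# First component score per group
boxplot(pc1 ~ groupingvar, ylab = "PC1 score")
```

Because only the first component is tested, a group difference that mostly loads on later components would go unnoticed, which is the limitation the note above refers to.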
We perform an SVM on the total data or the reduced data and compare the results. For that, we split the data into training and test sets, tune the SVM for the best parameters, then run the SVM and report the overall accuracy (kappa) as the output. This accuracy (0.6666667) was tested for significance using a permutation strategy. We performed 1 permutation. (The procedure permutes the group labels in the training data, tunes an SVM and applies it to the (non-randomised) test set; the kappa score of its prediction is saved. We use a binomial confidence interval to calculate a p value.)
The SVM procedure could not tell the two groups apart.
Details: [1] “80 variables: Accuracy of the prediction with sigmoid kernel (Kappa index: 0 denotes chance level, maximum is 1):0.666666666666667”
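To make the kappa scale concrete (0 at chance level, 1 at perfect agreement), here is a small sketch that computes Cohen's kappa from a confusion table. It assumes the reported kappa is Cohen's kappa; the labels are made up, and `cohen_kappa` is a hypothetical helper, not a function of the package.

```r
# Hypothetical true and predicted labels for a small test set
truth <- factor(rep(c("first", "second"), each = 4))
pred  <- factor(c("first", "first", "first", "second",
                  "second", "second", "second", "second"),
                levels = levels(truth))

# Cohen's kappa: observed agreement corrected for chance agreement
cohen_kappa <- function(truth, pred) {
  tab <- table(truth, pred)
  n <- sum(tab)
  p_obs <- sum(diag(tab)) / n                      # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (p_obs - p_exp) / (1 - p_exp)
}

print(cohen_kappa(truth, pred))   # 7 of 8 correct, chance agreement 0.5 -> 0.75
print(cohen_kappa(truth, truth))  # perfect agreement -> 1
```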
Distribution of the accuracy scores with permuted labels, with a vertical line at the score obtained using the real groups.
P value calculation:
library(Hmisc) # exports `binconf`
k <- sum(abs(Acc_sampled) >= abs(Accuracyreal)) # two-tailed test
R <- binconf(k, length(Acc_sampled), method = "exact")
print(zapsmall(R)) # 95% CI by default
## PointEst Lower Upper
## 1 0.025 1
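The printed interval can be reproduced with base R as a cross-check. The sketch below assumes that `k = 1` and `n = 1` (one permutation, whose score was at least as extreme as the real one), which is inferred from the output rather than stated in the report, and uses `binom.test`, whose confidence interval is the exact (Clopper-Pearson) one.

```r
# Assumed inputs, inferred from the printed output: 1 permutation whose
# score is at least as extreme as the score obtained with the real groups
k <- 1
n <- 1

# Exact (Clopper-Pearson) 95% confidence interval from base R
ci <- binom.test(k, n)$conf.int
print(c(PointEst = k / n, Lower = ci[1], Upper = ci[2]))
```

With a single permutation the interval runs from 0.025 to 1, so it says very little about the p value; a larger number of permutations would be needed to narrow it.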
save.image(file = "results.rdata")